Add "User Group Diagnostics" Grafana dashboard #149

Merged: 7 commits, Jun 10, 2025

@jnywong jnywong commented May 19, 2025

This PR adds a new "User Group Diagnostics" Grafana dashboard that complements the "User Diagnostics" dashboard by showing resource usage aggregated by user group.

The new dashboard requires jupyterhub-groups-exporter to be set up on the hub. See below for what happens on hubs that do not have it.

To make this work across differently configured hubs, I decided to keep the "User Group Diagnostics" Grafana dashboard entirely separate from the "User Diagnostics" dashboard. If I combined both labels, username and usergroup, into one dashboard/PromQL query, that query would break on hubs that do not have jupyterhub-groups-exporter set up, because the jupyterhub_user_group_info metric is unavailable there. With the dashboards separated, a hub without jupyterhub-groups-exporter still gets a working "User Diagnostics" dashboard, while the "User Group Diagnostics" dashboard simply shows no data.
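As a rough illustration of the reasoning above (not the actual dashboard queries), the Python sketch below mimics how a PromQL join on username behaves: samples without a match on the other side are dropped, so an absent jupyterhub_user_group_info metric leaves nothing to display. The function name `usage_by_group` and the sample data are hypothetical.

```python
from collections import defaultdict


def usage_by_group(usage_by_user, group_info):
    """Join per-user usage with (username, usergroup) pairs and sum per group.

    Mirrors PromQL vector matching: a user with no matching group sample
    contributes nothing to the result.
    """
    totals = defaultdict(float)
    for username, usergroup in group_info:
        if username in usage_by_user:
            totals[usergroup] += usage_by_user[username]
    return dict(totals)


# Hypothetical per-user usage (e.g. CPU cores), keyed by username.
usage = {"alice": 2.0, "bob": 1.5, "carol": 0.5}

# Hypothetical group membership, standing in for jupyterhub_user_group_info.
# A user may belong to more than one group.
groups = [("alice", "research"), ("bob", "research"), ("bob", "teaching")]

# Hub with the exporter: usage is attributed to each group.
with_exporter = usage_by_group(usage, groups)

# Hub without the exporter: the group metric is empty, so the join is empty.
without_exporter = usage_by_group(usage, [])

print(with_exporter)
print(without_exporter)
```

In this sketch, "carol" has no group and is dropped from the grouped view, which is the same reason a combined username/usergroup query would return nothing at all on a hub without the exporter.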

The "User Diagnostics" dashboard included in this PR differs from the existing version of user.jsonnet, because that is technically a "Pod Diagnostics" dashboard. This PR aggregates pod-level data on a per-user basis and uses unescaped usernames from the kube_pod_annotations metric, rather than the usernames derived from kube_pod_labels, which are restricted to a limited character set.
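The per-user aggregation described above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the PromQL used in the dashboard: the pod names, the memory values, and the `memory_by_user` helper are hypothetical, and the `pod_annotations` mapping stands in for the username carried by kube_pod_annotations.

```python
from collections import defaultdict


def memory_by_user(pod_memory, pod_annotations):
    """Sum pod-level memory usage per user.

    pod_memory maps pod name -> usage in MiB.
    pod_annotations maps pod name -> unescaped username taken from the
    pod's annotations. Pods without a username annotation are skipped.
    """
    totals = defaultdict(int)
    for pod, mebibytes in pod_memory.items():
        username = pod_annotations.get(pod)
        if username is None:
            continue  # not a user pod, e.g. the hub itself
        totals[username] += mebibytes
    return dict(totals)


# Hypothetical pod-level samples; one user may own several pods.
pod_memory = {
    "jupyter-pod-a": 512,
    "jupyter-pod-b": 256,
    "jupyter-pod-c": 128,
    "hub": 64,
}

# Hypothetical annotation-derived usernames, kept unescaped.
pod_annotations = {
    "jupyter-pod-a": "alice@example.org",
    "jupyter-pod-b": "alice@example.org",
    "jupyter-pod-c": "bob",
}

per_user = memory_by_user(pod_memory, pod_annotations)
print(per_user)
```

Keying on the annotation value keeps a username such as "alice@example.org" readable in the dashboard, which a label value limited to a restricted character set could not represent as-is.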

Note

Metrics are available as a time series only from the date the jupyterhub-groups-exporter service was first deployed, so some of the usergroup-related PromQL in this PR is invalid for time ranges before that deployment. If you see an execution error in the dashboard, try selecting a more recent time window in which the service was running.

Ref: 2i2c-org/infrastructure#6065

@GeorgianaElena GeorgianaElena left a comment

Thank you @jnywong! This is looking very good! 🚀 ❤️ My only suggestion is to maybe expand the panel descriptions a bit more?

Ideally, we would then generate some reference docs from these descriptions (#94).

jnywong commented Jun 6, 2025

Cool! I have updated the descriptions, and more docs and detail will follow in the referenced repository, https://github.com/2i2c-org/jupyterhub-groups-exporter.

@jnywong jnywong requested a review from GeorgianaElena June 10, 2025 09:33

@GeorgianaElena commented

Thank you @jnywong, this is perfect!

@GeorgianaElena GeorgianaElena merged commit 67071c1 into jupyterhub:main Jun 10, 2025
3 checks passed